User comments on YouTube videos have increased exponentially as a result of the platform's rapid expansion. These comments serve as a source of feedback and user engagement for a video, but manually analyzing them is time-consuming and challenging for content creators. Sentiment analysis, a machine learning technique, can be used to categorize the sentiment of comments. This study investigates the effectiveness of sentiment analysis for analyzing YouTube comments. We gathered a sizable set of comments from well-known YouTube videos, annotated them for sentiment, and fed them to various machine learning models for classification. Our findings show that YouTube comments can be accurately categorized as positive, negative, or neutral using sentiment analysis, providing valuable insights into how viewers feel about the videos and the subjects they cover.
Introduction
Google acquired YouTube in 2006, following the platform's rapid rise in popularity.
Today, 300+ hours of video are uploaded every minute, with nearly 5 billion daily views.
YouTube’s original slogan “Broadcast Yourself” empowered user-generated content.
Monetization began in 2007, enabling influencers and brands to profit through sponsorships.
The platform transformed marketing by enabling brand-sponsored content.
The thesis aims to understand YouTube’s role in the marketplace through literature review and research on user behavior (millennials and businesses), offering insights for advertisers and platform developers.
Quality of YouTube Comments
YouTube comments often have a negative reputation, seen as low in value.
Studies show comment quality affects video likes/dislikes.
Machine learning classifiers can predict valuable comments, filtering out discriminatory or low-quality ones.
Sentiment Analysis on Comments
Few studies predict video popularity from comment sentiment.
Prior research (e.g., on anorexia-related videos) used sentiment analysis to show anti-anorexia content had more positive feedback and likes.
This study attempts to find correlations between comment sentiment and video popularity.
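As a rough illustration of the kind of correlation described above, the following sketch computes a Pearson coefficient between the percentage of positive comments and the percentage of likes per video. The function names and the sample values are assumptions made for this example; the numbers are invented and are not results from the study.

```python
import math


def percent_positive(labels):
    """Share of comments labelled 'positive', as a percentage."""
    if not labels:
        raise ValueError("no labels given")
    positives = sum(1 for label in labels if label == "positive")
    return 100.0 * positives / len(labels)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Made-up illustrative values, one entry per video. NOT data from the study.
pct_positive_comments = [62.0, 48.0, 75.0, 30.0, 55.0]
pct_likes = [90.0, 70.0, 95.0, 60.0, 85.0]

r = pearson(pct_positive_comments, pct_likes)
print(f"Pearson r = {r:.2f}")
```

A coefficient near 1 or -1 would indicate a strong linear relationship, while a value near 0 would indicate little linear relationship. A high coefficient on its own does not imply that likes can be predicted accurately for an individual video.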
Data Collection and Processing
Over 7 million English-language comments were collected from 3,000+ videos using Python and YouTube APIs.
The process involved keyword-based video searches (e.g., "Obama," "Federer") and extraction of comment metadata (text, timestamp, author).
Only UTF-8 encoded English comments were used to simplify analysis.
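The filtering step above could look roughly like the sketch below. The `Comment` record, the stopword list, and the `min_ratio` threshold are all hypothetical and chosen only for illustration. The stopword heuristic is a stand-in for proper language detection, and the source does not say how English comments were identified.

```python
from dataclasses import dataclass


# Hypothetical record mirroring the metadata named above (text, timestamp, author).
@dataclass
class Comment:
    text: str
    timestamp: str
    author: str


# Tiny illustrative stopword list (an assumption, not from the study).
ENGLISH_STOPWORDS = {
    "the", "is", "a", "and", "to", "of", "this", "it", "in", "i", "you", "was", "for",
}


def decode_utf8(raw: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def looks_english(text: str, min_ratio: float = 0.2) -> bool:
    """Crude heuristic: enough of the words are common English stopwords."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return False
    hits = sum(1 for w in words if w in ENGLISH_STOPWORDS)
    return hits / len(words) >= min_ratio


def keep_comment(raw: bytes) -> bool:
    """Keep a comment only if it decodes as UTF-8 and looks English."""
    text = decode_utf8(raw)
    return text is not None and looks_english(text)
```

For example, `keep_comment("This is a great video and I love it".encode("utf-8"))` passes the filter, while a byte string that is not valid UTF-8 is rejected before any language check runs.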
Machine Learning Classification
Comments were vectorized (converted from text to numerical format) for sentiment classification.
Techniques like Bag-of-Words (BoW) and decision trees were used to analyze and classify sentiment.
Tools like Weka were employed for implementing algorithms like Naive Bayes and K-Nearest Neighbors (KNN).
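To make the vectorization and classification steps concrete, here is a minimal plain-Python sketch of a Bag-of-Words representation feeding a multinomial Naive Bayes classifier with Laplace smoothing. It is a stand-in for illustration only: the study used Weka, and the function names, the whitespace tokenizer, and the four training comments below are assumptions introduced for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(texts):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Bag-of-Words: count how often each vocabulary token occurs in the text."""
    vec = [0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] += 1
    return vec


def train_naive_bayes(vectors, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    n_features = len(vectors[0])
    class_counts = Counter(labels)
    word_counts = {c: [0] * n_features for c in class_counts}
    for vec, label in zip(vectors, labels):
        for i, count in enumerate(vec):
            word_counts[label][i] += count
    model = {}
    for c, n_docs in class_counts.items():
        total = sum(word_counts[c]) + n_features
        log_prior = math.log(n_docs / len(labels))
        log_likelihoods = [math.log((cnt + 1) / total) for cnt in word_counts[c]]
        model[c] = (log_prior, log_likelihoods)
    return model


def predict(model, vec):
    """Return the class with the highest log-posterior score."""
    scores = {
        c: log_prior + sum(v * lw for v, lw in zip(vec, log_likelihoods))
        for c, (log_prior, log_likelihoods) in model.items()
    }
    return max(scores, key=scores.get)


# Invented toy training set, NOT data from the study.
texts = [
    "great video love it",
    "love this great song",
    "terrible video hate it",
    "hate this awful song",
]
labels = ["positive", "positive", "negative", "negative"]

vocab = build_vocabulary(texts)
model = train_naive_bayes([vectorize(t, vocab) for t in texts], labels)
print(predict(model, vectorize("love this great video", vocab)))
```

Tokens that never appeared in training are ignored at prediction time, which is one common convention. A production setup would more likely rely on an established library or tool such as Weka rather than hand-written code.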
Comment Summarization
Text summarization helps condense long comments into key points.
Two types:
Extractive: Selects key phrases directly from the original text.
Abstractive: Rewrites content using new language to capture core ideas.
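The extractive approach can be sketched with a simple word-frequency scorer: each sentence is scored by the average frequency of its non-stopword tokens, and the top-scoring sentences are returned in their original order. This is one common baseline, not necessarily the technique used in the study. The stopword list, the function names, and the sample comment are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption for this sketch).
STOPWORDS = {
    "the", "a", "an", "is", "it", "i", "in", "on", "my", "this", "and", "to", "of", "was",
}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


# Invented sample comment, used only to exercise the function.
comment = (
    "The camera work in this video is great. "
    "I watched it on my phone. "
    "Great camera, great video."
)
print(summarize(comment, max_sentences=1))
```

Averaging by sentence length keeps long sentences from winning purely because they contain more words. An abstractive summarizer would instead need a generative model and is outside the scope of this sketch.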
Conclusion
There is some correlation between the percentage of likes and the percentage of positive comments on YouTube. However, because the variation is very high, it is not possible to accurately predict the percentage of likes from comment sentiment alone using our method. The research question, "Can the comments on a YouTube video be used to determine what ratio of the viewers liked or disliked the video using sentiment analysis?", is difficult to answer using only our results. The method needs to improve before any substantial conclusions can be drawn. Most importantly, the training data needs to improve, irrelevant comments need to be sorted out, and the number of videos analyzed needs to increase. If these areas are improved upon, a conclusive answer to our research question could be found.